Phrasal Rank-Encoding: Exploiting Phrase Redundancy and Translational Relations for Phrase Table Compression



Related resources

Phrasal Rank-Encoding: Exploiting Phrase Redundancy and Translational Relations for Phrase Table Compression

We describe Phrasal Rank-Encoding (PR-Enc), a novel method for the compression of word-aligned target language data in phrase tables as used in phrase-based SMT. This method reduces the redundancy in phrase tables, which is a direct effect of the phrase-based approach. Combining PR-Enc with Huffman coding makes it possible to reduce the size of an aggressively compressed phrase table by another 39 per...


A Phrase Table without Phrases: Rank Encoding for Better Phrase Table Compression

This paper describes the first steps towards a minimum-size phrase table implementation to be used for phrase-based statistical machine translation. The focus lies on the size reduction of target language data in a phrase table. Rank Encoding (REnc), a novel method for the compression of word-aligned target language in phrase tables, is presented. Combined with Huffman coding, a relative size red...


Hierarchical Phrase Table Combination for Machine Translation

Typical statistical machine translation systems are batch trained on a given set of training data, and their performance is largely influenced by the amount of data. As the available data grows across different domains, it becomes computationally demanding to perform batch training every time new data arrives. To address this problem, we propose an efficient phrase table combination method. ...


TmTriangulate: A Tool for Phrase Table Triangulation

This work was supported by the grants no. 645452 (QT21) and no. 644402 (HimL) of the EU and SVV 260 104 of the Czech Republic. We used language resources hosted by the LINDAT/CLARIN project LM2010013 of the Ministry of Education, Youth and Sports. Introduction. Under-resourced language pair: scarcity of parallel corpora. SMT problem: no direct data → no SMT training; insufficient data → poor SMT per...


Phrase Table Training for Precision and Recall: What Makes a Good Phrase and a Good Phrase Pair?

In this work, the problem of extracting phrase translation is formulated as an information retrieval process implemented with a log-linear model aiming for a balanced precision and recall. We present a generic phrase training algorithm which is parameterized with feature functions and can be optimized jointly with the translation engine to directly maximize the end-to-end system performance. Mu...



Journal

Journal title: The Prague Bulletin of Mathematical Linguistics

Year: 2012

ISSN: 1804-0462,0032-6585

DOI: 10.2478/v10108-012-0009-6